INDEX
Intro: McD, The Golden Arches, The American Dream
Report Summary
Explanatory Data Analysis
1. Overview of Data Set
2. Statistical Overview of Data
3. Unique Values in the Category
4. Number of Items in Each Category
5. Plotting Food Categories Having the Highest and Lowest Variety of Items
6. Analysis of Variables with Outliers
7. Correlations between Variables
8. Category of Menu Contributing the max % of Cholesterol
9. Item of the Menu Contributing to the max intake of Sodium
10. Food Item which contributes the most Saturated Fat
11. Analysis of Nutrients across Categories of the McDonald's Menu
12. Analysis of Carbohydrates vs Sugar
13. Ordering Crispy (Fried) Chicken or Grilled Chicken
Conclusion
The McDonald brothers, Richard and Maurice McDonald, American food entrepreneurs, opened the original McDonald's in 1940 in San Bernardino, California, and its USP became the "Speedee Service System". On September 13, 1961, McDonald's, under the guidance of Ray Kroc, filed for a trademark on a new logo: an overlapping, double-arched "M" symbol. That marked the start of the Golden Arches and became a prominent milestone in the chapter of the "American Dream".
As of 2021, McDonald's has a presence in more than 100 countries, serving millions of people every day. But how did McDonald's become what it is today? The business model is simple: FRANCHISES.
First, Ray Kroc had a simple motto of "Maintaining Quality", "Maintaining Service Consistency", "Maintaining Cleanliness" and "Maintaining Value" across all franchise outlets. He went on to establish Hamburger University to teach franchisees to maintain that consistency.
Second, McDonald's has an incorrigible habit of adapting to the culture of each country. In India, for example, it adapted remarkably well to a vegetarian clientele while maintaining the same quality, service, cleanliness and value.
Third, brand presence: "I'm lovin' it" is an emotion for people in many countries, reminding them of their childhood.
Fourth, its ability to innovate: in 1975 McDonald's introduced its drive-through, where customers order and collect their food without getting out of the car. In the modern world of technology, it has also adapted quickly to delivery, with better packaging than most of its competitors.
According to most analysts, McDonald's is recession-proof. It is actually the 5th largest real estate company in the world. 85% of McDonald's outlets are owned by franchisees, who pay a fee (rent) in return for the brand name, and the McDonald's contract comes with multiple clauses, such as the vendors from whom raw materials must be bought, even if they charge higher prices. That is why the food tastes the same from outlet to outlet.
McDonald's has decades of experience in buying and selling plots. The franchisee has to open the outlet in a prime location chosen and owned by McDonald's, in return for which the franchisee pays McDonald's rent, a franchise fee and an equipment fee. If the franchisee at a location fails to run the business or draw traffic, McDonald's Corporation simply finds a new tenant (franchisee) to run the business, or sells the land.
The data set provided contains the nutrition values of every category of food and beverage sold at McDonald's, and of every item under each category. The data was analysed using Python and its libraries. Inferences from the analysis are shared along with graphical representations.
The analysis below uses Pandas, NumPy, Matplotlib (Pyplot), Seaborn, SciPy and Plotly Express.
#IMPORTING LIBRARIES OF PYTHON
import matplotlib as mt
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as stat
import plotly.express as px
import plotly.graph_objects as go
#import plotly.offline as pyo
#pyo.init_notebook_mode()
import warnings
warnings.filterwarnings('ignore')
sns.set_style('darkgrid')
# IMPORTING THE DATA SET
menu_df=pd.read_csv('Mcdonald .csv')
#UNDERSTANDING THE NUMBER OF COLUMNS AND THE ROWS IN THE DATASET
print("Number of Rows in the data set : {}".format(menu_df.shape[0]))
print("Number of Columns in the data set : {}".format(menu_df.shape[1]))
Number of Rows in the data set : 260
Number of Columns in the data set : 25
#THE BELOW LINE OF CODE WILL HELP TO SEE ALL THE COLUMNS IN THE DATA SET BY SETTING THE MAX DISPLAY TO NUMBER OF COLS
pd.set_option('display.max_columns',25)
pd.set_option('display.max_rows',25)
#PRINTING THE HEAD OF THE DATASET TO GET A PRELIMINARY UNDERSTANDING OF THE COLUMNS AND THE VALUE
menu_df.head()
| | Category | Item | Serving Size(in oz) | Serving Size(g) | Calories | Calories from Fat | Total Fat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | Cholesterol | Cholesterol (% Daily Value) | Sodium | Sodium (% Daily Value) | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Breakfast | Egg McMuffin | 4.8 | 136 | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | 260 | 87 | 750 | 31 | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
| 1 | Breakfast | Egg White Delight | 4.8 | 135 | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | 25 | 8 | 770 | 32 | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
| 2 | Breakfast | Sausage McMuffin | 3.9 | 111 | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | 45 | 15 | 780 | 33 | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
| 3 | Breakfast | Sausage McMuffin with Egg | 5.7 | 161 | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | 285 | 95 | 860 | 36 | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
| 4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 | 161 | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | 50 | 16 | 880 | 37 | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
menu_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 25 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   Category                       260 non-null    object
 1   Item                           260 non-null    object
 2   Serving Size(in oz)            260 non-null    float64
 3   Serving Size(g)                260 non-null    object
 4   Calories                       260 non-null    int64
 5   Calories from Fat              260 non-null    int64
 6   Total Fat                      260 non-null    float64
 7   Total Fat (% Daily Value)      260 non-null    int64
 8   Saturated Fat                  260 non-null    float64
 9   Saturated Fat (% Daily Value)  260 non-null    int64
 10  Trans Fat                      260 non-null    float64
 11  Cholesterol                    260 non-null    int64
 12  Cholesterol (% Daily Value)    260 non-null    int64
 13  Sodium                         260 non-null    int64
 14  Sodium (% Daily Value)         260 non-null    int64
 15  Carbohydrates                  260 non-null    int64
 16  Carbohydrates (% Daily Value)  260 non-null    int64
 17  Dietary Fiber                  260 non-null    int64
 18  Dietary Fiber (% Daily Value)  260 non-null    int64
 19  Sugars                         260 non-null    int64
 20  Protein                        260 non-null    int64
 21  Vitamin A (% Daily Value)      260 non-null    int64
 22  Vitamin C (% Daily Value)      260 non-null    int64
 23  Calcium (% Daily Value)        260 non-null    int64
 24  Iron (% Daily Value)           260 non-null    int64
dtypes: float64(4), int64(18), object(3)
memory usage: 50.9+ KB
INFERENCE
1. There are no null values in the data set.
2. Category and Item are categorical variables. Serving Size(g) is also stored as object dtype, so 22 of the 25 columns are numerical.
3. Nutrients such as Total Fat and Cholesterol each have two columns: the first gives the quantity (in grams or milligrams, depending on the nutrient), and the "% Daily Value" column indicates whether a serving is high or low in that nutrient.
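The `info()` output lists `Serving Size(g)` as `object` rather than a numeric dtype. A minimal sketch of how such a column could be inspected and coerced is given here. It uses a small made-up frame, and the values (including the `'n/a'` entry) are illustrative assumptions, not rows taken from the data set.

```python
import pandas as pd

# Toy frame standing in for menu_df; all values are invented for illustration.
toy = pd.DataFrame({
    "Item": ["A", "B", "C"],
    "Serving Size(g)": ["136", "135", "n/a"],  # stored as text, not numbers
})

# Columns that pandas did not parse as numbers.
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Coerce to numbers; anything unparseable becomes NaN instead of raising.
serving_g = pd.to_numeric(toy["Serving Size(g)"], errors="coerce")
n_unparseable = int(serving_g.isna().sum())

print(non_numeric_cols)
print(n_unparseable)
```

Checking `n_unparseable` first shows how many entries would be lost before the coerced column replaces the original.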
print('Statistical Overview of the Dataset'.center(75, ' '))
menu_df.describe().T
Statistical Overview of the Dataset
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Serving Size(in oz) | 260.0 | 12.803077 | 7.054481 | 1.0 | 6.775 | 12.0 | 16.00 | 32.0 |
| Calories | 260.0 | 368.269231 | 240.269886 | 0.0 | 210.000 | 340.0 | 500.00 | 1880.0 |
| Calories from Fat | 260.0 | 127.096154 | 127.875914 | 0.0 | 20.000 | 100.0 | 200.00 | 1060.0 |
| Total Fat | 260.0 | 14.165385 | 14.205998 | 0.0 | 2.375 | 11.0 | 22.25 | 118.0 |
| Total Fat (% Daily Value) | 260.0 | 21.815385 | 21.885199 | 0.0 | 3.750 | 17.0 | 35.00 | 182.0 |
| Saturated Fat | 260.0 | 6.007692 | 5.321873 | 0.0 | 1.000 | 5.0 | 10.00 | 20.0 |
| Saturated Fat (% Daily Value) | 260.0 | 29.965385 | 26.639209 | 0.0 | 4.750 | 24.0 | 48.00 | 102.0 |
| Trans Fat | 260.0 | 0.203846 | 0.429133 | 0.0 | 0.000 | 0.0 | 0.00 | 2.5 |
| Cholesterol | 260.0 | 54.942308 | 87.269257 | 0.0 | 5.000 | 35.0 | 65.00 | 575.0 |
| Cholesterol (% Daily Value) | 260.0 | 18.392308 | 29.091653 | 0.0 | 2.000 | 11.0 | 21.25 | 192.0 |
| Sodium | 260.0 | 495.750000 | 577.026323 | 0.0 | 107.500 | 190.0 | 865.00 | 3600.0 |
| Sodium (% Daily Value) | 260.0 | 20.676923 | 24.034954 | 0.0 | 4.750 | 8.0 | 36.25 | 150.0 |
| Carbohydrates | 260.0 | 47.346154 | 28.252232 | 0.0 | 30.000 | 44.0 | 60.00 | 141.0 |
| Carbohydrates (% Daily Value) | 260.0 | 15.780769 | 9.419544 | 0.0 | 10.000 | 15.0 | 20.00 | 47.0 |
| Dietary Fiber | 260.0 | 1.630769 | 1.567717 | 0.0 | 0.000 | 1.0 | 3.00 | 7.0 |
| Dietary Fiber (% Daily Value) | 260.0 | 6.530769 | 6.307057 | 0.0 | 0.000 | 5.0 | 10.00 | 28.0 |
| Sugars | 260.0 | 29.423077 | 28.679797 | 0.0 | 5.750 | 17.5 | 48.00 | 128.0 |
| Protein | 260.0 | 13.338462 | 11.426146 | 0.0 | 4.000 | 12.0 | 19.00 | 87.0 |
| Vitamin A (% Daily Value) | 260.0 | 13.426923 | 24.366381 | 0.0 | 2.000 | 8.0 | 15.00 | 170.0 |
| Vitamin C (% Daily Value) | 260.0 | 8.534615 | 26.345542 | 0.0 | 0.000 | 0.0 | 4.00 | 240.0 |
| Calcium (% Daily Value) | 260.0 | 20.973077 | 17.019953 | 0.0 | 6.000 | 20.0 | 30.00 | 70.0 |
| Iron (% Daily Value) | 260.0 | 7.734615 | 8.723263 | 0.0 | 0.000 | 4.0 | 15.00 | 40.0 |
INFERENCE
1. The table above helps identify the numeric columns, as the describe function by default only summarises numeric columns.
2. The table also provides key statistics for each variable: mean, standard deviation, min, max, and the 1st, 2nd and 3rd quartiles.
#UNIQUE CATEGORY ITEMS
print("The unique categories available at McDonald's are: \n\n {} ".format(menu_df.Category.unique()))
The unique categories available at McDonald's are:
['Breakfast' 'Beef & Pork' 'Chicken & Fish' 'Salads' 'Snacks & Sides' 'Desserts' 'Beverages' 'Coffee & Tea' 'Smoothies & Shakes']
# Name the counts explicitly: renaming a "Category" column silently does nothing on newer pandas versions
Item_count=menu_df['Category'].value_counts().rename("No. of Item").to_frame()
Item_count
| | No. of Item |
|---|---|
| Coffee & Tea | 95 |
| Breakfast | 42 |
| Smoothies & Shakes | 28 |
| Chicken & Fish | 27 |
| Beverages | 27 |
| Beef & Pork | 15 |
| Snacks & Sides | 13 |
| Desserts | 7 |
| Salads | 6 |
Item_count.plot(kind="bar",facecolor='green',figsize=(9,7),legend=False);
xticks=plt.xticks(rotation=90,family='serif',fontsize=12)
yticks=plt.yticks(family='serif',fontsize=12)
plt.xlabel('Categories of food served',family='serif',fontsize=15)
plt.ylabel('Count of Items',family='serif',fontsize=15)
plt.title("Count of Item in each Category\n\n",fontsize=20);
INFERENCE (from 4 & 5)
1. Coffee & Tea has the highest variety of items.
2. Salads has the least variety of items. So for people on a diet who are looking for salads, McDonald's may not be a good place to visit. Conversely, those people are not McDonald's target audience.
To find outliers we use only the columns displayed in the statistical overview table, since the describe function by default returns statistics only for numeric columns, and outliers can only be identified for continuous numeric variables.
Second, variables such as Total Fat (% Daily Value) and Cholesterol (% Daily Value) are derived from Total Fat and Cholesterol respectively. There is no point in analysing them separately: if Total Fat contains outliers, Total Fat (% Daily Value) will contain them too.
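To illustrate why the "% Daily Value" columns carry no extra information for outlier detection, the sketch below treats each one as the base amount divided by a fixed daily reference. The reference amounts are assumptions chosen because they reproduce the Egg McMuffin row printed in the head of the data set. They are not stated anywhere in the data itself.

```python
# Reference daily amounts ASSUMED for illustration; they were picked because
# they reproduce the Egg McMuffin row shown in the head() output.
ASSUMED_DAILY_REFERENCE = {
    "Total Fat": 65,        # grams
    "Saturated Fat": 20,    # grams
    "Cholesterol": 300,     # milligrams
    "Sodium": 2400,         # milligrams
    "Carbohydrates": 300,   # grams
}


def percent_daily_value(amount, reference):
    """Express an amount as a rounded percentage of a daily reference."""
    return round(amount / reference * 100)


# Egg McMuffin amounts copied from the first row of the data set.
egg_mcmuffin = {
    "Total Fat": 13.0,
    "Saturated Fat": 5.0,
    "Cholesterol": 260,
    "Sodium": 750,
    "Carbohydrates": 31,
}

computed = {
    name: percent_daily_value(amount, ASSUMED_DAILY_REFERENCE[name])
    for name, amount in egg_mcmuffin.items()
}
print(computed)
```

Because the mapping is a fixed rescaling (up to rounding), a value that is extreme in the base column stays extreme in the percentage column.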
fig,axes=plt.subplots(6,3,figsize=(15,30))
plt.tight_layout(pad=10)
#BOXPLOT FOR SERVING SIZE IN OZ
axes[0,0].set_title('Serving Size',fontsize=20);
sns.boxplot(data=menu_df["Serving Size(in oz)"],ax=axes[0,0]);
#BOXPLOT FOR CALORIES
axes[0,1].set_title('Calories',fontsize=20);
sns.boxplot(data=menu_df["Calories"],ax=axes[0,1]);
#BOXPLOT FOR CALORIES FROM FAT
axes[0,2].set_title('Calories from Fat',fontsize=20);
sns.boxplot(data=menu_df["Calories from Fat"],ax=axes[0,2]);
#BOXPLOT FOR TOTAL FAT
axes[1,0].set_title('Total Fat',fontsize=20);
sns.boxplot(data=menu_df["Total Fat"],ax=axes[1,0]);
#BOXPLOT FOR SATURATED FAT
axes[1,1].set_title('Saturated Fat',fontsize=20);
sns.boxplot(data=menu_df["Saturated Fat"],ax=axes[1,1]);
#BOXPLOT FOR TRANS FAT
axes[1,2].set_title('Trans Fat',fontsize=20);
sns.boxplot(data=menu_df["Trans Fat"],ax=axes[1,2]);
#BOXPLOT FOR CHOLESTEROL
axes[2,0].set_title('Cholesterol',fontsize=20);
sns.boxplot(data=menu_df["Cholesterol"],ax=axes[2,0]);
#BOXPLOT FOR SODIUM
axes[2,1].set_title('Sodium',fontsize=20);
sns.boxplot(data=menu_df["Sodium"],ax=axes[2,1]);
#BOXPLOT FOR CARBOHYDRATE
axes[2,2].set_title('Carbohydrates',fontsize=20);
sns.boxplot(data=menu_df["Carbohydrates"],ax=axes[2,2]);
#BOXPLOT FOR DIETARY FIBER
axes[3,0].set_title('Dietary Fiber',fontsize=20);
sns.boxplot(data=menu_df["Dietary Fiber"],ax=axes[3,0]);
#BOXPLOT FOR SUGAR
axes[3,1].set_title('Sugars',fontsize=20);
sns.boxplot(data=menu_df["Sugars"],ax=axes[3,1]);
#BOXPLOT FOR PROTEIN
axes[3,2].set_title('Protein',fontsize=20);
sns.boxplot(data=menu_df["Protein"],ax=axes[3,2]);
#BOXPLOT FOR VITAMIN A
axes[4,0].set_title('Vitamin A (% Daily Value)',fontsize=20);
sns.boxplot(data=menu_df["Vitamin A (% Daily Value)"],ax=axes[4,0]);
#BOXPLOT FOR VITAMIN C
axes[4,1].set_title('Vitamin C (% Daily Value)',fontsize=20);
sns.boxplot(data=menu_df["Vitamin C (% Daily Value)"],ax=axes[4,1]);
#BOXPLOT FOR CALCIUM
axes[4,2].set_title('Calcium (% Daily Value)',fontsize=20);
sns.boxplot(data=menu_df["Calcium (% Daily Value)"],ax=axes[4,2]);
#BOXPLOT FOR IRON
axes[5,0].set_title('Iron (% Daily Value)',fontsize=20);
sns.boxplot(data=menu_df["Iron (% Daily Value)"],ax=axes[5,0]);
INFERENCE
As per the box plot analysis above, the following variables have outliers:
1. Serving Size
2. Calories
3. Calories from Fat
4. Total Fat
5. Trans Fat
6. Cholesterol
7. Sodium
8. Carbohydrates
9. Sugars
10. Protein
11. Vitamin A
12. Vitamin C
13. Calcium
14. Iron
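The list above was read off the box plots by eye. As a cross-check, a minimal sketch of the usual 1.5 × IQR whisker rule is given here. The helper name `iqr_outlier_counts` and the toy values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd


def iqr_outlier_counts(df, columns, whisker=1.5):
    """Count values outside [Q1 - whisker*IQR, Q3 + whisker*IQR] per column."""
    counts = {}
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        outside = (df[col] < q1 - whisker * iqr) | (df[col] > q3 + whisker * iqr)
        counts[col] = int(outside.sum())
    return pd.Series(counts, name="outliers")


# Toy data (invented): one extreme calorie value, no extreme fiber value.
toy = pd.DataFrame({
    "Calories": [200, 250, 300, 350, 400, 1880],
    "Dietary Fiber": [0, 1, 1, 2, 3, 3],
})
outlier_counts = iqr_outlier_counts(toy, ["Calories", "Dietary Fiber"])
print(outlier_counts)
```

On the real data the same helper could be called with `menu_df` and the list of plotted columns, giving a count per variable instead of a visual judgement.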
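# Correlate numeric columns only; text columns such as Category and Item cannot be correlated
corr_mat=menu_df.select_dtypes(include='number').corr()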
plt.figure(figsize=(65,40));
plt.xticks(rotation=90,size=50)
plt.yticks(size=50)
# Generates a mask for the upper triangle
mask = np.triu(np.ones_like(corr_mat, dtype=bool))
sns.heatmap(corr_mat,
annot=True,
mask = mask,
annot_kws={'size': 40});
The heatmap above shows the correlation between all the variables in the data set, but we are generally concerned with only a handful of them, such as calories, fat, sugar and dietary fiber. So let's look at a concentrated heatmap of Calories, Cholesterol, Trans Fat, Sugars and Dietary Fiber, which can help us choose our next McDonald's order when we are on a diet.
cols = ['Calories','Cholesterol','Trans Fat','Sugars','Dietary Fiber']
cm = np.corrcoef(menu_df[cols].values.T)
sns.set(font_scale = 1.5)
hm = sns.heatmap(cm,cbar = True,
annot = True,
annot_kws = {'size':15},
yticklabels = cols,
xticklabels = cols)
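A heatmap is easy to scan but hard to rank by eye. The sketch below lists the most strongly correlated column pairs as a small table instead. The helper name `strongest_pairs` and the toy numbers are assumptions for illustration only.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(df, top=3):
    """Return the `top` column pairs with the largest absolute correlation."""
    corr = df.select_dtypes(include="number").corr()
    # Visit each unordered pair of columns exactly once.
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "r"])
    order = pairs["r"].abs().sort_values(ascending=False).index
    return pairs.loc[order].head(top).reset_index(drop=True)


# Toy data (invented): Total Fat rises in lockstep with Calories.
toy = pd.DataFrame({
    "Calories": [100, 200, 300, 400],
    "Total Fat": [5, 10, 15, 20],
    "Dietary Fiber": [3, 1, 4, 2],
})
top_pairs = strongest_pairs(toy)
print(top_pairs)
```

Sorting by absolute value keeps strong negative relationships near the top as well, which a colour scale can make easy to overlook.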
table_chol=pd.pivot_table(menu_df,values="Cholesterol (% Daily Value)",index=['Category'])
table_chol
| Category | Cholesterol (% Daily Value) |
|---|---|
| Beef & Pork | 28.933333 |
| Beverages | 0.185185 |
| Breakfast | 50.952381 |
| Chicken & Fish | 25.222222 |
| Coffee & Tea | 9.378947 |
| Desserts | 4.857143 |
| Salads | 17.333333 |
| Smoothies & Shakes | 14.714286 |
| Snacks & Sides | 6.230769 |
INFERENCE
1. The Breakfast category contributes the most cholesterol on average. So, when you enter McDonald's for breakfast, think twice!
table_sod=pd.pivot_table(menu_df,values="Sodium",index=['Item']).sort_values(by='Sodium',ascending=False)
table_sod.head()
| Item | Sodium |
|---|---|
| Chicken McNuggets (40 piece) | 3600 |
| Big Breakfast with Hotcakes and Egg Whites (Large Biscuit) | 2290 |
| Big Breakfast with Hotcakes (Large Biscuit) | 2260 |
| Big Breakfast with Hotcakes and Egg Whites (Regular Biscuit) | 2170 |
| Big Breakfast with Hotcakes (Regular Biscuit) | 2150 |
INFERENCE
Chicken McNuggets (40 piece) has the highest sodium content of any item, at 3600 mg.
table_satfat=pd.pivot_table(menu_df,values="Saturated Fat",index=['Item']).sort_values(by='Saturated Fat',ascending=False)
table_satfat.head(10)
| Item | Saturated Fat |
|---|---|
| McFlurry with M&M’s Candies (Medium) | 20.0 |
| Big Breakfast with Hotcakes (Large Biscuit) | 20.0 |
| Chicken McNuggets (40 piece) | 20.0 |
| Frappé Chocolate Chip (Large) | 20.0 |
| Double Quarter Pounder with Cheese | 19.0 |
| Big Breakfast with Hotcakes (Regular Biscuit) | 19.0 |
| Big Breakfast (Large Biscuit) | 18.0 |
| Frappé Mocha (Large) | 17.0 |
| Frappé Chocolate Chip (Medium) | 17.0 |
| Big Breakfast (Regular Biscuit) | 17.0 |
INFERENCE
As per the table above, four food items tie for the highest amount of saturated fat (20 each):
1. McFlurry with M&M’s Candies (Medium)
2. Big Breakfast with Hotcakes (Large Biscuit)
3. Chicken McNuggets (40 piece)
4. Frappé Chocolate Chip (Large)
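Since several items share the top value, `head(n)` with a fixed `n` can cut a tie in half. A small sketch of selecting every item at the maximum is given here. The helper name `items_at_max` is hypothetical, and the three rows are copied from the table above.

```python
import pandas as pd


def items_at_max(df, value_col, label_col="Item"):
    """Return every label whose value equals the column maximum (keeps ties)."""
    top_value = df[value_col].max()
    return df.loc[df[value_col] == top_value, label_col].tolist()


# Three rows taken from the saturated fat table above.
toy = pd.DataFrame({
    "Item": [
        "McFlurry with M&M's Candies (Medium)",
        "Chicken McNuggets (40 piece)",
        "Double Quarter Pounder with Cheese",
    ],
    "Saturated Fat": [20.0, 20.0, 19.0],
})
tied_items = items_at_max(toy, "Saturated Fat")
print(tied_items)
```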
The major nutrients are carbohydrates, proteins, fats, vitamins, minerals, dietary fibre and water. As not all of these variables are available to us, the analysis is based on Protein, Total Fat, Vitamin A and Dietary Fiber only.
11.1. Analysis of Proteins
protein=pd.DataFrame(menu_df.groupby('Category')['Protein'].mean())
colors=['gray']*9
colors[3]='Green'
fig = go.Figure(data=[go.Bar(
x=protein.index,
y=protein['Protein'],
marker_color=colors
)])
fig.update_yaxes(title='Average Protein')
fig.update_layout(width=900,height=700,
title={
'text': "Analysis of Protein",
'y':.9,
'x':.5,
'xanchor': 'center',
'yanchor': 'top'})
INFERENCE
The Chicken & Fish category has the highest average protein.
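The bar charts in this section pick the highlighted bar by a hard-coded position such as `colors[3]`, which relies on the alphabetical order of the categories. A sketch of deriving the highlight from the data instead is given here. The helper name `highlight_max` and the toy averages are invented for illustration.

```python
import pandas as pd


def highlight_max(series, base="gray", accent="green"):
    """Build one color per bar, using `accent` only for the largest value."""
    top_label = series.idxmax()
    return [accent if label == top_label else base for label in series.index]


# Toy category averages (invented values, not the real means).
toy_means = pd.Series({"Beef & Pork": 27.0, "Beverages": 1.0, "Chicken & Fish": 29.0})
colors = highlight_max(toy_means)
print(colors)
```

The resulting list can be passed as `marker_color` in the same way as the hand-built `colors` list, and it stays correct if categories are added or reordered.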
11.2. Analysis of Total Fat
fat=pd.DataFrame(menu_df.groupby('Category')['Total Fat'].mean())
colors=['gray']*9
colors[2]='Orange'
fig = go.Figure(data=[go.Bar(
x=fat.index,
y=fat['Total Fat'],
marker_color=colors
)])
fig.update_yaxes(title='Average Total Fat')
fig.update_layout(width=900,height=700,
title={
'text': "Analysis of Total Fat",
'y':.9,
'x':.5,
'xanchor': 'center',
'yanchor': 'top'})
INFERENCE
The Breakfast category has the highest average fat.
11.3. Analysis of Vitamin A (% Daily Value)
Vit=pd.DataFrame(menu_df.groupby('Category')['Vitamin A (% Daily Value)'].mean())
colors=['gray']*9
colors[6]='Skyblue'
fig = go.Figure(data=[go.Bar(
x=Vit.index,
y=Vit['Vitamin A (% Daily Value)'],
marker_color=colors
)])
fig.update_yaxes(title='Average Vitamin A (% Daily Value)')
fig.update_layout(width=900,height=700,
title={
'text': "Analysis of Vitamin A",
'y':.9,
'x':.5,
'xanchor': 'center',
'yanchor': 'top'})
INFERENCE
The Salads category has the highest average Vitamin A.
11.4. Analysis of Dietary Fiber
DtFiber=pd.DataFrame(menu_df.groupby('Category')['Dietary Fiber'].mean())
colors=['gray']*9
colors[6]='Brown'
fig = go.Figure(data=[go.Bar(
x=DtFiber.index,
y=DtFiber['Dietary Fiber'],
marker_color=colors
)])
fig.update_yaxes(title='Average Dietary Fiber')
fig.update_layout(width=900,height=700,
title={
'text': "Analysis of Dietary Fiber",
'y':.9,
'x':.5,
'xanchor': 'center',
'yanchor': 'top'})
INFERENCE
The Salads category has the highest average dietary fiber.
fig,axes=plt.subplots(2,2,figsize=(20,20))
# One swarm plot per nutrient; each axis rotates its own category labels
axes[0,0].set_title('Category and Protein');
ax1= sns.swarmplot(x="Category", y="Protein", data=menu_df,ax=axes[0,0]);
ax1.tick_params(axis='x',labelrotation=90);
axes[0,1].set_title('Category and Dietary Fiber');
ax2= sns.swarmplot(x="Category", y="Dietary Fiber", data=menu_df,ax=axes[0,1]);
ax2.tick_params(axis='x',labelrotation=90);
axes[1,0].set_title('Category and Vitamin A');
ax3= sns.swarmplot(x="Category", y="Vitamin A (% Daily Value)", data=menu_df,ax=axes[1,0]);
ax3.tick_params(axis='x',labelrotation=90);
axes[1,1].set_title('Category and Total Fat');
ax4= sns.swarmplot(x="Category", y="Total Fat", data=menu_df,ax=axes[1,1]);
ax4.tick_params(axis='x',labelrotation=90);
# Adjust spacing after all four plots have been drawn
plt.tight_layout(pad=8)
INFERENCE
1. The first plot shows that the Chicken & Fish category has a protein outlier lying far beyond the concentration of points around the mean. That means the bar graph, which is based on mean values, is not entirely reliable. Comparing the spread of data points in Beef & Pork with Chicken & Fish, and keeping the Chicken & Fish outlier in mind, Beef & Pork items typically have a higher protein content than Chicken & Fish items.
2. In the "Category and Total Fat" plot, Breakfast shows no outliers, so we can rely on the bar graph and state that Breakfast has the highest average fat. Going further, the category with the second-highest fat content in the bar graph is Chicken & Fish, which is questionable because the swarm plot clearly shows outliers in that category.
3. That Salads has the highest Vitamin A and dietary fiber is a sound conclusion from both the swarm plot and the bar plot, as none of its data points can be treated as clear outliers.
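The first inference turns on the mean being pulled upward by a single extreme item. A minimal sketch of comparing mean and median per category is given here, using invented protein values in which 87 (the maximum protein in the statistical overview) plays the role of the outlier.

```python
import pandas as pd

# Toy data (invented apart from the 87, which is the data set's maximum protein value).
toy = pd.DataFrame({
    "Category": ["Chicken & Fish"] * 4 + ["Beef & Pork"] * 4,
    "Protein": [15, 20, 25, 87, 24, 28, 30, 32],
})

# Mean is sensitive to the outlier; median is not.
summary = toy.groupby("Category")["Protein"].agg(["mean", "median"])
print(summary)
```

In this toy example the mean ranks Chicken & Fish first while the median ranks Beef & Pork first, which is the same pattern the swarm plot suggests for the real data.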
Difference between carbohydrates and sugar: when people eat a food containing carbohydrates, the digestive system breaks down the digestible ones into sugar, which enters the blood. As blood sugar levels rise, the pancreas produces insulin, a hormone that prompts cells to absorb blood sugar for energy or storage.
So, by this description, sugar is a derivative of carbohydrate. In that case, a scatter plot of carbohydrates vs sugars should show a relationship in which sugars increase as carbohydrates increase.
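# trendline="ols" fits an ordinary least squares line (Plotly Express relies on statsmodels for this)
fig=px.scatter(menu_df, x="Carbohydrates", y="Sugars", trendline="ols",color='Carbohydrates')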
fig.update_traces(marker_size=5)
fig.update_layout(width=900,height=700,
title={
'text': "Carbohydrate vs Sugar",
'y':.9,
'x':.5,
'xanchor': 'center',
'yanchor': 'top'});
fig.show()
INFERENCE
The chart supports the statement made before the analysis: as carbohydrates increase, sugar content increases too, which is consistent with sugar being a derivative of carbohydrate.
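The visual trend can be backed by two numbers: the Pearson correlation and the slope of a least squares line. The sketch below computes both with NumPy on invented values. It illustrates the calculation and does not report a result for the actual menu.

```python
import numpy as np

# Toy values (invented) standing in for the Carbohydrates and Sugars columns.
carbs = np.array([10, 30, 45, 60, 90], dtype=float)
sugars = np.array([2, 12, 30, 41, 70], dtype=float)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(carbs, sugars)[0, 1]

# Straight-line least squares fit: sugars is approximated by slope * carbs + intercept.
slope, intercept = np.polyfit(carbs, sugars, deg=1)

print(round(r, 3), round(slope, 3))
```

On the real data the same two calls could be applied to `menu_df["Carbohydrates"]` and `menu_df["Sugars"]`. A positive slope with `r` close to 1 would confirm the inference numerically.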
The analysis below works out whether to order crispy chicken or grilled chicken from a nutrition point of view, on the basis of calories and protein.
1. On Basis of Calories
#1. Average Calories Calculation for Crispy Chicken
fry = menu_df[menu_df['Item'].str.contains('Crispy Chicken')]
fry_cal = pd.DataFrame({'Item': fry.Item, 'Calories': fry.Calories})
avg_fry_cal = round(fry.Calories.mean(),2)
print("Average calories on Crispy Chicken category is {}".format(avg_fry_cal))
#2. Average Calories Calculation for Grilled Chicken
grilled = menu_df[menu_df['Item'].str.contains('Grilled Chicken')]
grilled_cal = pd.DataFrame({'Item': grilled.Item, 'Calories': grilled.Calories})
avg_grilled_cal = round(grilled.Calories.mean(), 2)
print("Average calories on Grilled Chicken category is {} \n\n".format( avg_grilled_cal))
#plotting a graph based on the observation
avg_cal = pd.DataFrame({'Categories':['Crispy Chicken', 'Grilled Chicken'],
'Avg Calories': [avg_fry_cal, avg_grilled_cal]})
avg_cal.plot.bar(x = 'Categories', y = 'Avg Calories', figsize=(7,5));
Average calories on Crispy Chicken category is 520.0
Average calories on Grilled Chicken category is 386.92
INFERENCE
So, someone looking for high calories can go for items with Crispy Chicken rather than Grilled Chicken.
2. On Basis of Protein
#1. Average Protein Calculation for Crispy Chicken
fry2 = menu_df[menu_df['Item'].str.contains('Crispy Chicken')]
fry2_pro = pd.DataFrame({'Item': fry2.Item, 'Protein': fry2.Protein})
avg_fry2_pro = round(fry2.Protein.mean(),2)
print("Average protein on Crispy Chicken category is {}".format(avg_fry2_pro))
#2. Average Protein Calculation for Grilled Chicken
grilled2 = menu_df[menu_df['Item'].str.contains('Grilled Chicken')]
grilled2_pro = pd.DataFrame({'Item': grilled2.Item, 'Protein': grilled2.Protein})
avg_grilled2_pro = round(grilled2.Protein.mean(), 2)
print("Average protein on Grilled Chicken category is {} \n\n".format( avg_grilled2_pro))
#plotting a graph based on the observation
avg_pro = pd.DataFrame({'Categories':['Crispy Chicken', 'Grilled Chicken'],
'Avg Protein': [avg_fry2_pro, avg_grilled2_pro]})
avg_pro.plot.bar(x = 'Categories', y = 'Avg Protein', figsize=(7,5));
Average protein on Crispy Chicken category is 24.93
Average protein on Grilled Chicken category is 28.62
INFERENCE
There is not much difference in the protein content of the two chicken categories, but someone looking for a higher protein count can go for Grilled Chicken.
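The calorie and protein comparisons above repeat the same filter-and-average steps. A sketch of a small helper that does both in one pass is given here. The name `compare_item_groups`, the item names and the numbers are all invented for illustration.

```python
import pandas as pd


def compare_item_groups(df, patterns, nutrients):
    """Average the nutrient columns over items whose name contains each pattern."""
    rows = {
        pattern: df.loc[df["Item"].str.contains(pattern, regex=False), nutrients].mean()
        for pattern in patterns
    }
    # One row per pattern, one column per nutrient.
    return pd.DataFrame(rows).T.round(2)


# Toy menu (invented items and values).
toy = pd.DataFrame({
    "Item": [
        "Crispy Chicken Sandwich", "Crispy Chicken Wrap",
        "Grilled Chicken Sandwich", "Grilled Chicken Wrap",
    ],
    "Calories": [500, 540, 350, 410],
    "Protein": [24, 26, 28, 30],
})
comparison = compare_item_groups(
    toy, ["Crispy Chicken", "Grilled Chicken"], ["Calories", "Protein"]
)
print(comparison)
```

Putting both nutrients side by side makes the trade-off visible in one table: in the toy data the crispy items carry more calories while the grilled items carry slightly more protein.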
The Golden Arches Theory states that countries with a McDonald's did not go to war (or into conflict) with each other after McDonald's was established there. That does not mean we can conclude that establishing a McDonald's in every country will lead to world peace.
Data is a vast world where we need to think and then act, not act and then think. With that, I would like to conclude my report.